1.  Introduction

With chip size reaching 1 million transistors, the need for high-level
design of circuits becomes compelling. The main stumbling block in the
development of design methods for VLSI algorithms is to find an
interface that provides a good separation of the physical and
algorithmic concerns. Among the physical issues, timing is the most
critical, since it is essential not only to the real-time behavior of a
circuit, but also to its logical correctness if synchronous techniques
are used.

Synchronous techniques are detrimental to the use of high-level design
methods because they don’t “scale well”: a circuit may cease to
function correctly when its feature sizes are scaled down to smaller
dimensions. Further, with the increasing size of circuits, it becomes
more and more difficult to distribute a clock signal safely across a
chip, and the restrictions attached to wire lengths in order to
maintain certain timing properties add extra complication to the
already difficult layout problem.

For all those reasons, self-timed techniques (as defined in [10]) are
particularly attractive for high-level VLSI design [9]. We propose a
synthesis method for self-timed circuits in which the computation is
initially described as a set of communicating processes in the notation
of [3], which is similar to C.A.R. Hoare’s CSP [2] but augmented with
the probe construct. This first description is the reference solution,
which has to be proved correct. The program is then compiled into a
self-timed circuit by applying a series of semantics-preserving
transformations. Hence the circuit obtained is correct by construction.

Unlike most silicon compilation methods and hardware description
languages, the method leads to efficient circuits. It has been applied
with “hand compilation” to a series of difficult self-timed design
problems, such as distributed mutual exclusion, fair arbitration, and
routing automata, with great success. Actually, the method, applied by
a person in a mechanical way, will typically produce better results
than the most experienced designers can produce. The main reason for
the efficiency of the method is that, rather than going in one step
from the program notation to the circuit, the designer applies a series
of transformations to the original program. At each level of the
transformation, powerful algebraic manipulations can be performed,
leading to important optimizations in terms of speed or area.

We shall first present the program notation and the VLSI operators that
constitute the “object code”. We then describe the four steps of the
compilation and illustrate the method with one sizeable example, the
construction of a stack. We shall conclude that this technique can be
used for high quality and high complexity designs, fully automated from
a provably correct high-level description. (For a more complete
description of the method, see [4], [5], [6], and [7].)


2.  The  program  notation 

The language used for the high-level description is close to C.A.R.
Hoare’s CSP [2]. We give only a very informal definition of the
constructs used in this paper.

i)  b↑ stands for b := true, b↓ stands for b := false.

ii)  The execution of the selection command (generalized IF-statement)
[G1 → S1 | ... | Gn → Sn], where G1 through Gn are Boolean expressions
and S1 through Sn are program parts (Gi is called a “guard”, and
Gi → Si a “guarded command”), amounts to the execution of an arbitrary
Si for which Gi holds. If ¬(G1 ∨ ... ∨ Gn) holds, the execution of the
command is suspended until (G1 ∨ ... ∨ Gn) holds.

iii)  For atomic actions x and y, “x, y” stands for the execution of x
and y in any order.

iv)  [G], where G is a Boolean, stands for [G → skip], and thus for
“wait until G holds”. (Hence, “[G]; S” and [G → S] are equivalent.)

v)  *[S] stands for “repeat S forever”.

vi)  From ii) and iii), the operational description of the statement
*[[G1 → S1 | ... | Gn → Sn]] is “repeat forever: wait until some Gi
holds; execute an Si for which Gi holds”.
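
The operational reading of the selection and repetition commands can be
sketched in Python. This is only an illustration and not part of the
notation: guards are modeled as predicates on a state dictionary,
commands as functions that update it, and the names select, repeat, and
toggle are hypothetical. Suspension is modeled by returning False
instead of blocking, and the repetition is bounded so that it
terminates.

```python
import random
from typing import Callable, Dict, List, Tuple

State = Dict[str, bool]
GuardedCommand = Tuple[Callable[[State], bool], Callable[[State], None]]


def select(commands: List[GuardedCommand], state: State, rng=random) -> bool:
    """One attempt at [G1 -> S1 | ... | Gn -> Sn].

    Executes an arbitrary Si whose guard Gi holds and returns True.
    Returns False when no guard holds (the command would be suspended).
    """
    ready = [body for guard, body in commands if guard(state)]
    if not ready:
        return False
    rng.choice(ready)(state)
    return True


def repeat(commands: List[GuardedCommand], state: State, max_iterations: int) -> int:
    """Bounded stand-in for *[[G1 -> S1 | ...]]: stops at the first suspension."""
    done = 0
    while done < max_iterations and select(commands, state):
        done += 1
    return done


def set_b(value: bool) -> Callable[[State], None]:
    """Build the command b-up (value=True) or b-down (value=False)."""
    def body(state: State) -> None:
        state["b"] = value
    return body


# *[[not b -> b-up | b -> b-down]]: the guards are mutually exclusive.
toggle = [(lambda s: not s["b"], set_b(True)), (lambda s: s["b"], set_b(False))]
```

Because the guards of toggle exclude each other, the arbitrary choice in
select is deterministic here; with overlapping guards any ready command
may be chosen.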

Communicating  processes 

A concurrent computation is described as a set of processes
communicating with each other by communication actions on channels (no
shared variables). When no messages are transmitted, communication on a
channel is reduced to synchronization signals. The name of the channel
is then sufficient for identifying a communication action.

If two processes p1 and p2 share a channel named X in p1 and Y in p2,
at any time the completion of the nth X-action “coincides” with the
completion of the nth Y-action. If, for example, p1 reaches the nth
X-action before p2 reaches the nth Y-action, the completion of X is
suspended until p2 reaches Y. The X-action is then said to be pending.

Probe 

Instead of the usual selection mechanism by which a set of pending
communication actions can be selected for execution, we provide a
general Boolean command on channels, called the probe. In process p1,
the probe command X̅ has the same value as the predicate “A
communication action Y is pending in p2”.

Hence the guarded command X̅ → X guarantees that the X-action is not
suspended. And a construct of the form [X̅ → X | Y̅ → Y] can be used
for selection.

3.  The  “object  code” 

In standard digital VLSI design, the MOS transistor is idealized as an
on/off switch. Unfortunately, the switch model is too crude, ignoring
too many electrical phenomena that play an important role in the
functioning of the circuit. Therefore, trying to carry the discrete
model of a computation down to the transistor level is very likely to
lead either to incorrect implementations or to a too complicated model
of the computation. A crucial decision in the development of our method
has been to choose an “object code” at a higher level than the
transistor. We have chosen to construct a notation that provides the
weakest possible form of control structure and smallest number of
program constructs. In fact, the notation contains exactly one
construct, the production rule, and is therefore called the
“production-rule set notation”.

This minimal notation has been chosen so that i) it has sound
semantics, ii) any non-terminating program can be compiled into
production rules, iii) the transformation into a circuit is
straightforward.

In fact, we consider the production-rule set as the canonical
representation of a circuit. This representation can be decomposed into
several equivalent networks of gates depending on the set of building
blocks used, but the production-rule set represents the circuit
independently of the gate implementations.

4.  Production rules

Production rules can be seen as a weaker form of guarded commands.
Consider the production rule G ↦ S.

•  S is either a simple assignment, or an unordered list “s1, s2, s3,
...” of simple assignments, where a simple assignment is the assignment
of true or false to a single Boolean variable.

•  G is a Boolean expression, called the guard of the production rule.
If G holds, the correct execution of S is guaranteed only if G remains
invariantly true until the completion of S. We say that G must be
stable.

A production rule set is an unordered set (a collection) of production
rules. Consider the canonical production rule set PRS:

    G1 ↦ S1
    G2 ↦ S2
    ...
    Gn ↦ Sn

•  Unlike the guarded commands of a selection or a repetition, the
mutual exclusion among the different production rules of a set is not
part of the semantics of the construct. The correct execution of a
production rule set is guaranteed only if interfering production rules
are mutually exclusive. Two production rules are said to be interfering
when their right-hand sides share a variable. Each process will be
implemented as a p.r.s. such that exactly one p.r. is firable at any
time, hence enforcing non-interference.

•  If stability of the guards and mutual exclusion among interfering
production rules are guaranteed, the production rule set PRS is
semantically equivalent to the non-terminating repetition *[[GCS]],
where GCS is the guarded command set syntactically identical to PRS.
Stability of the guards is essential to guarantee the absence of races
and hazards. When stability cannot be enforced, a special operator
called “synchronizer” has to be used. When mutual exclusion cannot be
enforced, a special operator called “arbiter” has to be used. These two
operators are not needed in this paper.
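
As an illustration of these semantics, the following Python sketch
executes a production rule set one rule at a time and rejects states in
which two interfering rules have true guards. It is a hypothetical
sequential model, not the paper's tooling: the names Rule, step, run,
and AND_GATE are assumptions, and because firings are atomic in this
model, stability of the guards holds trivially and is not checked.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, bool]
Assignment = Tuple[str, bool]                        # (variable, value)
Rule = Tuple[Callable[[State], bool], List[Assignment]]  # guard |-> assignments


def targets(rule: Rule) -> set:
    """Variables assigned by the right-hand side of a rule."""
    return {var for var, _ in rule[1]}


def step(rules: List[Rule], state: State) -> bool:
    """Fire one rule whose guard holds and whose firing changes the state.

    Raises RuntimeError if two interfering rules (right-hand sides sharing
    a variable) have true guards at the same time.
    Returns False when no rule can change the state.
    """
    enabled = [rule for rule in rules if rule[0](state)]
    for i, first in enumerate(enabled):
        for second in enabled[i + 1:]:
            if targets(first) & targets(second):
                raise RuntimeError("interfering production rules are not mutually exclusive")
    for _, assignments in enabled:
        if any(state[var] != value for var, value in assignments):
            for var, value in assignments:
                state[var] = value
            return True
    return False


def run(rules: List[Rule], state: State, max_steps: int = 100) -> State:
    """Fire rules until the state is quiescent or max_steps is reached."""
    for _ in range(max_steps):
        if not step(rules, state):
            break
    return state


# Example rule set: x AND y |-> z up ; (NOT x) OR (NOT y) |-> z down.
AND_GATE: List[Rule] = [
    (lambda s: s["x"] and s["y"], [("z", True)]),
    (lambda s: not s["x"] or not s["y"], [("z", False)]),
]
```

For instance, run(AND_GATE, {"x": True, "y": True, "z": False}) raises z,
and lowering x afterwards lets the second rule lower z again.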

We implement a p.r.s. by decomposing it into a collection of production
rule sets each of which has a known VLSI implementation. Those
primitive production rule sets correspond to logic gates or standard
VLSI cells that are our ultimate building blocks.

The set of operators with which we want to build our circuits is not
unique. The descriptions of the operators used in this paper in terms
of their production rules and their logic symbols are as follows.

The “and”:

    (x, y) ∧ z  ≡  x ∧ y ↦ z↑
                   ¬x ∨ ¬y ↦ z↓.

The “or”:

    (x, y) ∨ z  ≡  x ∨ y ↦ z↑
                   ¬x ∧ ¬y ↦ z↓.

The wire:

    x w y  ≡  x ↦ y↑
              ¬x ↦ y↓.

The fork:

    x f (y, z)  ≡  x ↦ y↑, z↑
                   ¬x ↦ y↓, z↓.

The C-element:

    (x, y) C z  ≡  x ∧ y ↦ z↑
                   ¬x ∧ ¬y ↦ z↓.


The  asymmetric  C-element: 


(a;; y)  aC_ 


The “flip-flop”:

    (x; y) ff z  ≡  x ↦ z↑
                    y ↦ z↓.

A  negated  input  or  output  is  represented  on  the  figures  by  a 
small  circle  on  the  corresponding  port.  A  wire  with  its  input 
negated  is  an  inverter.  A  cell  with  a  negated  input  is  considered 
as  one  cell,  and  not  as  the  composition  of  an  inverter  and  a  cell. 
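
The difference between the combinational operators and the
state-holding ones can be made concrete with a small Python sketch.
Each function below returns the next value of the output z given the
inputs and the current z; the function names are hypothetical, and the
model ignores delays entirely. The asymmetric C-element is left out of
this sketch.

```python
def and_gate(x: bool, y: bool, z: bool) -> bool:
    """(x, y) and z: x AND y raises z, (NOT x) OR (NOT y) lowers it."""
    return x and y


def or_gate(x: bool, y: bool, z: bool) -> bool:
    """(x, y) or z: x OR y raises z, (NOT x) AND (NOT y) lowers it."""
    return x or y


def c_element(x: bool, y: bool, z: bool) -> bool:
    """(x, y) C z: both inputs true raise z, both false lower it, else z holds."""
    if x and y:
        return True
    if not x and not y:
        return False
    return z  # no guard holds: the output keeps its value


def flip_flop(x: bool, y: bool, z: bool) -> bool:
    """(x; y) ff z: x raises z, y lowers z; x and y must exclude each other."""
    if x and y:
        raise ValueError("set and reset inputs must be mutually exclusive")
    if x:
        return True
    if y:
        return False
    return z  # neither guard holds: the output keeps its value
```

The guards of the “and” and the “or” are complementary, so the current
value of z never matters; for the C-element and the flip-flop there are
input combinations in which no guard holds, which is what makes them
state-holding.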

5.  The  compilation  method 
Process  decomposition 

The first step of the compilation, called “process decomposition”,
consists in replacing a process by several semantically equivalent
processes. The purpose of the decomposition is to obtain a process
representation of the program in which the right-hand side of each
guarded command is a straight-line program, i.e., consists only of
simple assignments and communication commands, composed by semi-colons
and commas. Process decomposition is applied repeatedly until the
right-hand side of each guarded command is a straight-line program.
Process decomposition plays an important role in the compilation of
large programs. We won’t need it in the example treated here. See [5]
for a typical use of this transformation.

Handshaking  expansion 

The implementation of communication, called “handshaking expansion”,
replaces each channel by a pair of wire-operators and each
communication action by its implementation in terms of a “four-phase
handshaking” protocol. Channel (X, Y) is implemented by the two wires
(xo w yi) and (yo w xi).

Initially, xo, xi, yo, and yi are false. For a matching pair (X, Y) of
actions, the implementation is not symmetrical in X and Y. One action
is called active and the other one passive. The four-phase
implementation with X active and Y passive is:


    X  ≡  xo↑; [xi]; xo↓; [¬xi]        (1)

    Y  ≡  [yi]; yo↑; [¬yi]; yo↓.       (2)

When no action of a matching pair is probed, the choice of which one
should be active and which one passive is arbitrary, but a choice has
to be made. The choice can be important for the composition of
identical circuits. A simple rule is that for a given channel (X, Y),
all actions at one side are active and all actions at the other side
are passive. If X̅ is used, all X-actions are passive, with the obvious
restriction that Y̅ cannot be used in the same program. The
implementation of the probe is simply:

    X̅  ≡  xi

A probed communication action X̅ → ... X is implemented:

    xi → ... xo↑; [¬xi]; xo↓.          (3)

Reshuffling 

Consider the handshaking expansion of program p according to (1), (2),
and (3). Provided that the cyclic order of the four handshaking actions
of a communication command is respected, the last two actions of this
command can be inserted at any place in p without invalidating the
semantics of the communication involved. However, modifying the order
of these two actions relative to other actions of p may introduce
deadlock. The possibility of reshuffling the second half of the
handshaking sequence plays an important role in the compilation method
as a source of algebraic manipulations.

Production  rule  expansion 

The next step is to compile the handshaking expansion of the program
into a set of production rules from which all explicit sequencing has
been removed. This is the most difficult step, in particular because it
requires, in all but trivial cases, the introduction of state variables
to identify each state of the computation uniquely.

Operator  reduction 

The last step, called “operator reduction”, consists in identifying
sets of production rules in the program with sets of production rules
describing operators. The non-trivial part in this step is called
“symmetrization”. It is used for transforming the guards of the
production rules so as to make them ‘look like’ the guards of
operators. After this last step, the program has been replaced by a
network of operators for which standard cells exist. (We have
constructed a cell library of self-timed elements in SCMOS technology.
Since many cells are parametrized, the library is extendable.)

6.  Example:  single  variable  register 

Consider the following process that provides read and write access to a
simple Boolean variable x:

    *[[P̅ → P?x | Q̅ → Q!x]]        (4)

where ¬P̅ ∨ ¬Q̅ holds at any time, i.e., read and write requests
exclude each other in time.

Handshaking  expansion 

The handshaking expansion of (4) uses the “double-rail” technique: the
Boolean value of x is encoded on two wires, one for the value true and
one for the value false. Each guarded command of (4) is expanded to two
guarded commands:

    *[[ pi1 → x↑; [x]; po↑; [¬pi1]; po↓
      | pi2 → x↓; [¬x]; po↑; [¬pi2]; po↓
      | x ∧ qi → qo1↑; [¬qi]; qo1↓            (5)
      | ¬x ∧ qi → qo2↑; [¬qi]; qo2↓
     ]].



The production-rule expansion of the first two guarded commands gives:

    pi1 ↦ x↑
    pi1 ∧ x ↦ po↑
    ¬pi1 ↦ po↓
    pi2 ↦ x↓
    pi2 ∧ ¬x ↦ po↑
    ¬pi2 ↦ po↓

The first and fourth p.r.’s correspond to the flip-flop:
(pi1; pi2) ff x. The other p.r.’s can be transformed into:

    (pi1 ∧ x) ∨ (pi2 ∧ ¬x) ↦ po↑
    (¬pi1 ∨ ¬x) ∧ (¬pi2 ∨ x) ↦ po↓

which is the definition of the IF-cell (pi1; pi2; x) IF po. This set of
p.r.’s can also be implemented as:

    (pi1, x) ∧ po1
    (pi2, ¬x) ∧ po2
    (po1, po2) ∨ po.

The production-rule expansion of the last two guarded commands of (5)
gives:

    x ∧ qi ↦ qo1↑
    ¬x ∨ ¬qi ↦ qo1↓
    ¬x ∧ qi ↦ qo2↑
    x ∨ ¬qi ↦ qo2↓,

which corresponds to the two operators (x, qi) ∧ qo1 and
(¬x, qi) ∧ qo2. The circuit is represented in Figure 1.
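
As a sanity check on the register, the production rules can be
evaluated in Python. The sketch below is a zero-delay model under
stated assumptions: it applies the flip-flop rules, the merged
(IF-cell) rules for po, and the two “and” operators for qo1 and qo2
until nothing changes, while hypothetical helper functions write and
read play the role of the environment. The names settle, new_register,
write, and read are not from the paper.

```python
from typing import Dict


def settle(s: Dict[str, bool]) -> Dict[str, bool]:
    """Apply the register's production rules until the state stops changing."""
    assert not (s["pi1"] and s["pi2"]), "write requests must be mutually exclusive"
    while True:
        before = dict(s)
        # Flip-flop: pi1 raises x, pi2 lowers x.
        if s["pi1"]:
            s["x"] = True
        if s["pi2"]:
            s["x"] = False
        # Write acknowledge: raised once x matches the requested value.
        s["po"] = (s["pi1"] and s["x"]) or (s["pi2"] and not s["x"])
        # Read outputs, double-rail encoded.
        s["qo1"] = s["x"] and s["qi"]
        s["qo2"] = (not s["x"]) and s["qi"]
        if s == before:
            return s


def new_register() -> Dict[str, bool]:
    """All variables initially false."""
    return dict(pi1=False, pi2=False, qi=False, x=False,
                po=False, qo1=False, qo2=False)


def write(s: Dict[str, bool], value: bool) -> None:
    """Environment side of P?x: four-phase handshake on one input rail."""
    rail = "pi1" if value else "pi2"
    s[rail] = True
    settle(s)
    assert s["po"]          # acknowledge raised after x is set
    s[rail] = False
    settle(s)
    assert not s["po"]      # acknowledge withdrawn


def read(s: Dict[str, bool]) -> bool:
    """Environment side of Q!x: raise qi, sample the rails, lower qi."""
    s["qi"] = True
    settle(s)
    assert s["qo1"] != s["qo2"]  # exactly one output rail is raised
    value = s["qo1"]
    s["qi"] = False
    settle(s)
    return value
```

A write of true followed by a read returns true, and the stored value
survives the return-to-zero phase because no rule on x has a true guard
once both input rails are low.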

7.  The  lazy  stack 

A lazy stack is one in which the full elements, i.e., the elements of
the stack that contain a piece of data, are not necessarily contiguous.
For instance, after a “pop” operation removes a data portion from the
top element of the stack, the hole created in the top element is not
filled even if some other element of the stack contains a data portion.
Obviously, we must record whether a stack element is full or empty. In
the implementation given in [3], a Boolean variable is used for this
purpose. Here we shall use a different coding: a stack element is
described as two programs (one for the empty case, one for the full
case) which call each other in a mutually recursive way.

We restrict ourselves to Boolean data portions. A data portion is added
to a stack element by a command on the input channel “in”. A data
portion is removed from a stack element by a command on the output
channel “out”. We assume that the environment never attempts to add
portions to a full stack nor to remove portions from an empty stack.
Hence a request to remove a portion from an empty element causes the
element to obtain the next data portion from the “rest of the stack”.
Such an action uses the input channel “get”. Similarly, a request to
add a portion to a full element causes the element to push the portion
it contains to the “rest of the stack”. Such an action uses the output
channel “put”.

The program for the empty stack element is called E. The program for
the full stack element is called F. We have

    E ≡ [ i̅n̅ → in?x; F                  F ≡ [ i̅n̅ → put!x; in?x; F
        | o̅u̅t̅ → get?x; out!x; E             | o̅u̅t̅ → out!x; E         (6)
        ]                                    ]


The  initialization  of  an  empty  stack  element  is  a  call  of  E . 
The  initialization  of  a  full  stack  element  is  a  call  of  F . 
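
The behavior specified by (6) can be sketched sequentially in Python.
This is a behavioral model only, with hypothetical names (Element,
push, pop): channel communications become method calls, the mutually
recursive programs E and F become a full flag, and a request that would
need a missing “rest of the stack” raises an error, reflecting the
assumption made on the environment.

```python
from typing import Optional


class Element:
    """One lazy stack element; `rest` is the next element toward the bottom."""

    def __init__(self, rest: Optional["Element"] = None) -> None:
        self.full = False          # False: behaves as E, True: behaves as F
        self.x = False             # stored Boolean data portion
        self.rest = rest

    def push(self, value: bool) -> None:
        """Command on channel in."""
        if self.full:
            # F: put!x; in?x; F  (push the stored portion down first)
            if self.rest is None:
                raise OverflowError("push on a full stack")
            self.rest.push(self.x)
        # E: in?x; F
        self.x = value
        self.full = True

    def pop(self) -> bool:
        """Command on channel out."""
        if self.full:
            # F: out!x; E  (the hole created here is not filled)
            self.full = False
            return self.x
        # E: get?x; out!x; E  (fetch from the rest, stay empty)
        if self.rest is None:
            raise IndexError("pop on an empty stack")
        return self.rest.pop()
```

With three elements chained as top → middle → bottom, pushing true,
false, false and then popping once leaves the top element empty while
the two elements below it stay full, which is the “lazy” behavior
described above.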

8.  Implementation  of  the  control  part 

Let us first implement the “control part” of the program, i.e., the
programs E and F from which message communication has been removed. We
assume that the stack is empty initially. Instead of using mutual
recursion, we use (what may look like) a slightly less symmetrical
coding of (6): we introduce the channel (t, t') and call F from within
E by the usual construction of process decomposition. We get

    E ≡ *[[ i̅n̅ → in; t                  F ≡ *[[ t̅'̅ ∧ i̅n̅ → put; in
          | o̅u̅t̅ → get; out                    | t̅'̅ ∧ o̅u̅t̅ → out; t'     (7)
         ]].                                  ]].

In the handshaking expansion, the choice of active and passive
communications is entirely dictated by the occurrence of the probes. We
get

    E ≡
    *[[ ¬ti ∧ ini → ino↑; [¬ini]; ino↓; to↑; [ti]; to↓
      | ¬ti ∧ outi → geto↑; [geti]; geto↓; [¬geti]; outo↑; [¬outi]; outo↓
     ]]

    F ≡
    *[[ ti' ∧ ini → puto↑; [puti]; puto↓; [¬puti]; ino↑; [¬ini]; ino↓
      | ti' ∧ outi → outo↑; [¬outi]; outo↓; to'↑; [¬ti']; to'↓
     ]].

9.  Compilation  of  E 

The first guarded command of E is a standard passive-active buffer
element implemented as an active-active buffer composed with a
passive-passive adaptor (Fig. 2.a). The second guarded command is a
standard stack element implemented as an active-active buffer with
input outi inverted (Fig. 2.b). The active-active buffer is a standard
cell called a D-element.

Figure 2: The two guarded commands of E

Figure 3: Implementation of E

Next, we have to enforce mutual exclusion between the two guarded
commands of E. Since in and out are mutually exclusive, it suffices to
guarantee that when in is completed in the first guarded command, the
second guarded command cannot start until t is completed. In order to
strengthen the guard of the second command with the appropriate
expression, we introduce in the handshaking expansion of the first
guarded command the variable z. We get

    z ∧ ini → ino↑; z↓; [¬ini]; ino↓; to↑; [ti]; to↓; [¬ti]; z↑

as the handshaking expansion of the first guarded command. Obviously,
it suffices to strengthen the guard of the second guarded command with
z to guarantee mutual exclusion between the two g.c.’s. We get

    outi ∧ z → geto↑; [geti]; geto↓; [¬geti]; outo↑; [¬outi]; outo↓.

Since we can weaken ¬outi as ¬outi ∨ ¬z, the only transformation is the
replacement of outi by z ∧ outi. This gives the circuit of Figure 3 as
an implementation of E.

10.  Compilation  of  F 

The compilation of the first guarded command of F is identical to that
of the second command of E, with the appropriate change of variables.
The compilation of the second command, however, can be drastically
simplified by reshuffling. Since channel (t, t') is an internal
channel, we can reshuffle the handshaking sequence of t' without
deadlock. The handshaking expansion of the second guarded command
becomes:

    ti' ∧ outi → outo↑; to'↑; [¬ti' ∧ ¬outi]; outo↓; to'↓

This sequence compiles immediately into the C-element:
(ti', outi) C (outo, to').


Figure  4:  The  control  part  of  stack  element 


Figure  5:  Adding  the  data  path 


The  channels  in  and  out  are  used  both  in  E  and  F ,  so 
we  need  to  merge  the  local  copies  of  in  and  the  local  copies  of 
out  in  the  standard  way.  The  resulting  circuit  for  the  control 
part  of  the  stack  element  is  shown  in  Figure  4. 

11.  Implementation of the data path

Let S1 and S2 denote program (6) and program (7), respectively. We now
have to extend the implementation of S2 so as to obtain an
implementation of S1. We want to leave S2 unchanged and introduce an
extra “data path” process P such that the parallel composition of S2
and P implements S1. More precisely, the channels in, out, get, put of
S2 are renamed in', out', get', put'. P communicates with S2 via the
renamed channels and with the environment via in, out, get, put. (See
Figure 5.)

By comparing S1 and S2, we derive that P has to implement the
operations:

    in' • in?x
    out' • out!x
    get' • get?x
    put' • put!x

where A • B denotes the simultaneous execution of A and B. (We can
define the completion of an action so that the simultaneous execution
of two actions is well-defined. The implementation of A • B amounts to
interleaving the handshaking sequences of A and B.)

The implementation of the four actions of P is based on the register
program constructed in Section 6. For the sake of brevity, we omit the
rest of the derivation, which can be found in [8]. The entire data path
is described in Figure 6.

Figure 6: The data path

The dual-port flip-flop used in the data path is defined as:

    (s1, s2; t1, t2) 2ff x  ≡  s1 ∨ s2 ↦ x↑
                               t1 ∨ t2 ↦ x↓

(By definition, at most one input is true at any time.)

12.  The  complete  circuit 

Two important optimizations are added to the design. The first one
concerns the implementation of the second guarded command of E:

    o̅u̅t̅ → get?x; out!x.

We observe that, in this case, unlike all other guarded commands of
(6), the value of x involved in the second action (out!x) is the same
as the value of x involved in the first action (get?x). We can
therefore encode the value of x in the handshaking expansion of the
guarded command without having to use the register. The reshuffled
handshaking expansion including the double-rail encoding of x gives:

    ¬ti ∧ outi → geto↑; [geti1 → outo1↑ | geti2 → outo2↑];
                 [¬outi]; geto↓; [¬geti1 → outo1↓ | ¬geti2 → outo2↓]

The circuit is

    (¬ti, outi) ∧ geto
    geti1 w outo1
    geti2 w outo2

The second optimization concerns the implementation of in' • in?x,
which is more complex than that of get' • get?x because in is passive
while get is active. We replace in?x and put!x by ins; in?x and
outs; out!x, respectively, with ins passive and in active, and outs
active and out passive. For the output action out, the implementation
is the same whether the channel is active or passive. The complete
circuit is shown in Figure 7 with the data path extended to four bits.

13.  Concluding  remarks 

By combining control and data, the design of a lazy stack encompasses
most self-timed design issues (except for arbitration, which is treated
in [4] and [5]).

Let us summarize the main advantages of the method. First, the source
language, in particular the use of the probe, produces compact and
efficient algorithms, which can be further “tuned” through process
decomposition. Second, the handshaking expansion combined with
reshuffling offers powerful algebraic manipulations. Third, the
production rule notation provides a canonical representation of the
circuit which is straightforward to translate into whatever set of VLSI
gates is available or convenient to use. Finally, the notion of
stability of a guard captures exactly the necessary and sufficient
condition to avoid races and hazards.

We already have a compiler that produces about the same design fully
automatically [1]. Figure 8 shows a typical layout produced by the
assembler from the operator set. Each operator has a standard cell
representation. The cells of a process are stacked to form a tower in
which power, reset, and ground run vertically.